1 Article
1.1 Article Link
1.2 Summary of Article
This article argues that most of the work a data scientist does day to day is data engineering and cleaning. The author explains that many young data scientists (the author included) aspired to build large, elaborate projects and visualizations, but ended up doing the more mundane and less glamorous task of data cleaning. Much of the available curriculum teaches us how to work with data that has already been cleaned and made easy to use. By contrast, the real world is full of sensor data and other ‘raw’ forms of data that must be cleaned before they can be worked with.
1.4 Image From Article
1.5 More Information
- According to the author, most of a data scientist's day-to-day work is data engineering and cleaning.
- There is much more data cleaning than data science in the real world.
- Three main steps of data engineering: Extract, Transform, Load (ETL)
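
The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration and not the article's own code: the sample sensor data, the `readings` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and the cleaning rule (drop any row whose reading is not a number) is just one assumed choice among many.

```python
import csv
import io
import sqlite3

# Hypothetical raw sensor export: stray whitespace, a missing value,
# and a non-numeric reading. Invented for illustration only.
RAW_CSV = """sensor_id,reading
a1, 20.5
a2,
a3,error
a1,21.0
"""


def extract(text):
    """Extract: read the raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip whitespace and drop rows whose reading is not a number."""
    cleaned = []
    for row in rows:
        value = (row["reading"] or "").strip()
        try:
            cleaned.append((row["sensor_id"].strip(), float(value)))
        except ValueError:
            # Assumed policy: silently skip unusable readings.
            continue
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    "SELECT sensor_id, reading FROM readings ORDER BY reading"
).fetchall()
print(rows)
```

Only two of the four raw rows survive the transform step, which mirrors the article's point: most of the code deals with cleaning, not analysis.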